Project Unsupervised Learning

Andres Delgadillo

1 Project: AllLife Bank Customer Segmentation

1.1 Objective

To identify different segments among the bank's existing customers, based on their spending patterns and past interactions with the bank, using clustering algorithms, and to provide recommendations to the bank on how to better market to and service these customers.

1.2 Data Dictionary

1.3 Questions to be answered

2 Import packages and turn off warnings

3 Import dataset and quality of data

This first assessment of the dataset shows:

4 Exploratory Data Analysis

4.1 Pandas profiling report

We can get a first statistical and descriptive analysis using pandas_profiling.

The Pandas Profiling report shows some warnings and characteristics in the data:

4.2 Univariate Analysis

4.3 Pairplot

We are going to perform bivariate analysis to understand the relationship between the columns.

4.4 Bivariate and Multivariate Analysis

Features with positive correlation:

Features with negative correlation:

5 Data Pre-Processing

5.1 Feature Engineering

5.2 Outlier detection

We are going to check whether the features contain outliers.

Avg_Credit_Limit

There are several points with an average credit limit above the upper whisker (100000). However, those values could correspond to premium customers. Therefore, we are not going to clip those values.

Total_Credit_Cards

There are no outliers.

Total_visits_bank

There are no outliers.

Total_visits_online

There are some points above the upper whisker. However, they could correspond to customers who log in online more often. Therefore, we are not going to clip those values.

Total_calls_made

There are no outliers.
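The whisker checks above can be sketched with the usual box-plot rule, which flags points more than 1.5 times the interquartile range beyond the quartiles. This is a minimal sketch, assuming that rule is the one behind the box plots; the helper name `iqr_outlier_mask` and the sample values are hypothetical and are not taken from the bank's data.

```python
import numpy as np


def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True for points outside the whiskers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Made-up illustrative values (NOT the AllLife Bank dataset)
sample = np.array([10, 12, 11, 13, 12, 11, 10, 14, 13, 100])
mask = iqr_outlier_mask(sample)
print("flagged values:", sample[mask])
```

Flagging a point this way only marks it for inspection. Whether to clip it remains a judgment call, as with the premium customers discussed above.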

5.3 Scaling data

6 K-means Clustering

6.1 Elbow curve

We are going to use the Elbow Curve method to identify the optimal number of clusters.
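As a rough illustration of the elbow method, the sketch below fits scikit-learn's `KMeans` for several values of K and records the within-cluster sum of squares (`inertia_`). The data are synthetic points drawn around three hypothetical centers, standing in for the scaled customer features; the range of K and the use of `StandardScaler` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer features (assumption, not the real data)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [-8.0, 8.0]])
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 2)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means for each candidate K and store the inertia
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"K={k}: inertia={value:.2f}")
```

Plotting `inertias` against K gives the elbow curve. The K after which the decrease flattens is the candidate number of clusters.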

6.2 Silhouette Score

Now, we are going to use the Silhouette Score to identify the optimal number of clusters.
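The silhouette-based selection can be sketched in the same way: fit `KMeans` for each K from 2 upward and keep the K with the highest `silhouette_score`. The data below are again synthetic points around three hypothetical centers, not the bank's customers, and the range of K is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer features (assumption, not the real data)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [-8.0, 8.0]])
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 2)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

# The silhouette score needs at least two clusters, so K starts at 2
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print("best K by silhouette score:", best_k)
```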

6.3 Customer profile K=3

K=3 is an appropriate number of clusters because it has the highest silhouette score and the elbow curve shows a kink at K=3.

Average values for each feature
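A cluster profile of this kind is typically a group-by mean over the cluster labels. The sketch below shows the idea with pandas; the column names come from the dataset, but the rows and the `cluster` column are made-up illustrative values, not results from this analysis.

```python
import pandas as pd

# Made-up rows for illustration only (NOT results from the bank data)
df = pd.DataFrame(
    {
        "Avg_Credit_Limit": [10000, 12000, 50000, 52000, 150000, 160000],
        "Total_Credit_Cards": [2, 3, 5, 6, 9, 10],
        "cluster": [0, 0, 1, 1, 2, 2],  # hypothetical cluster labels
    }
)

# Mean of every feature within each cluster
profile = df.groupby("cluster").mean()
print(profile)
```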

Box plots for each feature

7 Hierarchical Clustering

Now, we are going to use the Hierarchical Clustering method to group the customers.

7.1 Cophenetic correlation

We are going to use the cophenetic correlation to identify the appropriate distance metric and linkage method.
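One way to run this search is to loop over metric and linkage combinations with SciPy and keep the pair with the highest cophenetic correlation coefficient. This is a sketch on synthetic data; the lists of metrics and methods are assumptions and may differ from the combinations actually tried in the project.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in for the scaled customer features (assumption)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [-8.0, 8.0]])
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 2)) for c in centers])

results = {}
for metric in ["euclidean", "cityblock", "chebyshev"]:
    # ward, centroid and median linkage are only defined for Euclidean
    # distances, so they are left out of this generic loop
    for method in ["single", "complete", "average", "weighted"]:
        Z = linkage(X, method=method, metric=metric)
        coeff, _ = cophenet(Z, pdist(X, metric=metric))
        results[(metric, method)] = coeff

best = max(results, key=results.get)
print("best (metric, linkage):", best, "cophenetic correlation:", round(results[best], 3))
```

A coefficient closer to 1 means the dendrogram distances preserve the original pairwise distances more faithfully.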

7.2 Dendrograms

Now, we are going to use the dendrograms to identify the proper number of clusters.
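Once a number of clusters has been read off the dendrogram, the tree can be cut to obtain labels. The sketch below builds the linkage matrix with SciPy, computes the dendrogram layout without drawing it, and cuts the tree into three clusters with `fcluster`. The data are synthetic, and the choice of average linkage, Euclidean distance and three clusters is an assumption made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in for the scaled customer features (assumption)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [-8.0, 8.0]])
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 2)) for c in centers])

Z = linkage(X, method="average", metric="euclidean")

# no_plot=True only computes the layout; remove it to draw the
# dendrogram (drawing requires matplotlib)
tree = dendrogram(Z, no_plot=True)

# Cut the tree so that at most three flat clusters remain
labels = fcluster(Z, t=3, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```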

7.3 Customer profile

Average values for each feature

Box plots for each feature

8 K-means and Hierarchical Clustering comparison

Now, we are going to compare both techniques and the clusters found by each of them.

8.1 Execution time comparison and number of clusters

8.2 Cluster comparison

Now, we are going to compare the clusters found by each technique.
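A comparison of this kind can be sketched with a cross-tabulation of the two label sets plus the adjusted Rand index, which does not depend on how each method numbers its clusters. The data are synthetic, and the model settings (three clusters, average linkage) are assumptions for illustration, not the settings confirmed by this project.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the scaled customer features (assumption)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [-8.0, 8.0]])
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 2)) for c in centers])

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hc_labels = AgglomerativeClustering(n_clusters=3, linkage="average").fit_predict(X)

# Rows: K-means clusters, columns: hierarchical clusters
table = pd.crosstab(km_labels, hc_labels, rownames=["kmeans"], colnames=["hierarchical"])
ari = adjusted_rand_score(km_labels, hc_labels)

print(table)
print("adjusted Rand index:", round(ari, 3))
```

Cluster numbers are arbitrary in both methods, so matching clusters show up as one dominant cell per row of the table rather than along the diagonal.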

Both techniques determine almost the same clusters.

This customer seems more similar to Hierarchical cluster 2 than to cluster 0. Therefore, K-means produced a better assignment for this specific customer.

8.3 Insights about different customers

Now, we are going to plot the different characteristics of the customers.

There are 3 different groups of customers with the following characteristics:

9 Actionable Insights & Recommendations